Investigating the Usefulness of Generalized Word Representations in SMT

Authors

  • Nadir Durrani
  • Philipp Koehn
  • Helmut Schmid
  • Alexander M. Fraser
Abstract

We investigate the use of generalized representations (POS, morphological analysis and word clusters) in phrase-based models and the N-gram-based Operation Sequence Model (OSM). Our integration enables these models to learn richer lexical and reordering patterns, consider wider contextual information and generalize better in sparse data conditions. When interpolating generalized OSM models on the standard IWSLT and WMT tasks we observed improvements of up to +1.35 on the English-to-German task and +0.63 for the German-to-English task. Using automatically generated word classes in standard phrase-based models and the OSM models yields an average improvement of +0.80 across 8 language pairs on the IWSLT shared task.
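The intuition behind generalizing in sparse data conditions can be illustrated with a toy sketch. It is not the phrase-based or OSM model from the abstract: the word-to-class table `WORD_CLASS`, the helper names, and the two example sentences are all hypothetical, and plain bigram counts stand in for the real models. The sketch only shows that a test sentence whose word bigrams were never seen in training can still be fully covered once words are mapped to classes.

```python
from collections import Counter

# Hypothetical word-to-class table (an assumption for illustration only).
# In practice the labels would come from a POS tagger, a morphological
# analyzer, or an automatic word clustering tool.
WORD_CLASS = {
    "the": "C1", "a": "C1",
    "house": "C2", "garden": "C2",
    "is": "C3",
    "small": "C4", "green": "C4",
}


def generalize(tokens, mapping, unknown="UNK"):
    """Replace each surface word with its class label."""
    return [mapping.get(tok, unknown) for tok in tokens]


def bigrams(seq):
    """Return the adjacent pairs of a sequence as a list."""
    return list(zip(seq, seq[1:]))


train = "the house is small".split()
test = "a garden is green".split()

# Bigram statistics over surface words and over class labels.
word_counts = Counter(bigrams(train))
class_counts = Counter(bigrams(generalize(train, WORD_CLASS)))

# Count how many test bigrams were observed in training at each level.
seen_as_words = sum(bg in word_counts for bg in bigrams(test))
seen_as_classes = sum(
    bg in class_counts for bg in bigrams(generalize(test, WORD_CLASS))
)

print(seen_as_words, seen_as_classes)
```

In this toy setting none of the three test bigrams is observed at the word level, while all three are observed at the class level, which is the kind of wider coverage a generalized representation is meant to provide.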

Similar resources

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

On the Integral Representations of Generalized Relative Type and Generalized Relative Weak Type of Entire Functions

In this paper we establish the integral representations of generalized relative type and generalized relative weak type as introduced by Datta et al. [9]. We also investigate their equivalence under certain conditions.

On generalized reduced representations of restricted Lie superalgebras in prime characteristic

Let $\mathbb{F}$ be an algebraically closed field of prime characteristic $p>2$ and $(g, [p])$ a finite-dimensional restricted Lie superalgebra over $\mathbb{F}$. It is shown that any finite-dimensional indecomposable $g$-module is a module for a finite-dimensional quotient of the universal enveloping superalgebra of $g$. These quotient superalgebras are called the generalized reduced enveloping ...

Improving Statistical Machine Translation Using Word Sense Disambiguation

We show for the first time that incorporating the predictions of a word sense disambiguation system within a typical phrase-based statistical machine translation (SMT) model consistently improves translation quality across all three different IWSLT Chinese-English test sets, as well as producing statistically significant improvements on the larger NIST Chinese-English MT task, and moreover never...

Usefulness of Serum NT-proBNP in Diagnosis of Generalized Seizures in Egyptian Children

Background: Seizures may occur in as many as 1% of children. The most urgent type of seizure is the generalized tonic-clonic seizure (GTCS). N-terminal prohormone of brain natriuretic peptide (NT‐proBNP) has been considered a promising biomarker in numerous acute illnesses. We aimed to evaluate the usefulness of NT‐proBNP for diagnosis of g...

Publication date: 2014